
add array element mutex offset in print and gc #58997


Status: Open. vtjnash wants to merge 2 commits into master.

Conversation

@vtjnash (Member) commented Jul 14, 2025

The layout and printing logic need to correctly offset and align the inset fields to account for the per-element mutex of an atomic array with large elements.

Fix #58993
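
To make the layout concrete, here is a minimal C sketch of the addressing rule involved, assuming (per the description above) that each element of a locked atomic array stores a small mutex in front of its data; the names and the lock-size parameter are illustrative, not the actual runtime fields.

#include <stddef.h>

/* Illustrative only: compute where element j's payload starts when the
 * element stride already includes a leading per-element lock. */
static char *element_data(char *base, size_t j, size_t elsize,
                          int islocked, size_t lock_bytes)
{
    char *p = base + j * elsize;   /* step by the full (lock + data) stride */
    if (islocked)
        p += lock_bytes;           /* print/GC must skip the mutex to reach the fields */
    return p;
}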

@vtjnash vtjnash requested a review from gbaraldi July 14, 2025 17:16
@vtjnash vtjnash added multithreading Base.Threads and related functionality backport 1.12 Change should be backported to release-1.12 labels Jul 14, 2025
@oscardssmith oscardssmith added the bugfix This change fixes an existing bug label Jul 14, 2025
@gbaraldi gbaraldi added the needs tests Unit tests are required for this change label Jul 14, 2025
@oscardssmith (Member)

Does this also need 1.11 backport?

@gbaraldi (Member)

julia> struct CxRL
           condvar::Condition
           x::Int
           lock::ReentrantLock
           CxRL() = new(Condition(), 5, ReentrantLock())
       end

julia> struct OnceHolder
           once::OncePerThread{CxRL}
           OnceHolder() = new(OncePerThread{CxRL}(CxRL))
       end

julia> holder = OnceHolder()
OnceHolder(OncePerThread{CxRL, Type{CxRL}}(CxRL[], UInt8[], CxRL))

julia> if length(ARGS) > 0
           display(holder)
       end

julia> holder.once()
CxRL(Condition(), 5, ReentrantLock())

julia> holder.once.xs[2]

This still hangs

@vtjnash (Member, Author) commented Jul 15, 2025

Alright, now includes the additional revert required to fix that bug too.

@gbaraldi (Member)

The only issue with reverting that is a performance regression: LLVM's cost model pessimizes vectorization when the index calculation happens outside the GEP (it's their fault, and maybe they will fix it in the future, but it makes a bunch of operations regress). See #57389.
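
For context, a hedged illustration (not the actual codegen) of the two addressing forms being contrasted: an element-typed step that LLVM can express as one simple GEP, versus a byte offset multiplied out before the pointer add, which the vectorizer's cost model treats less favorably.

#include <stddef.h>

/* Stride folded into the address computation: maps to a single, simple GEP. */
static double *typed_index(double *base, size_t i)
{
    return base + i;
}

/* Multiply done separately, then a raw byte add: the form LLVM costs pessimistically. */
static double *byte_index(char *base, size_t i, size_t stride_bytes)
{
    return (double *)(base + i * stride_bytes);
}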

@vtjnash vtjnash added merge me PR is reviewed. Merge when all tests are passing and removed needs tests Unit tests are required for this change labels Jul 17, 2025
@gbaraldi gbaraldi removed the merge me PR is reviewed. Merge when all tests are passing label Jul 18, 2025
@gbaraldi (Member)

According to #59030 (comment), this doesn't fix the issue.

@@ -4820,9 +4820,10 @@ static jl_cgval_t emit_memoryref(jl_codectx_t &ctx, const jl_cgval_t &ref, jl_cg
setName(ctx.emission_context, ovflw, "memoryref_ovflw");
}
#endif
Type *elty = isboxed ? ctx.types().T_prjlvalue : julia_type_to_llvm(ctx, jl_tparam1(ref.typ));
@gbaraldi (Member) commented Jul 20, 2025

Is there no way to implement this? I guess julia_type_to_llvm doesn't make the lock an FCA? As I said, this will pessimize vectorization a lot.

@vtjnash (Member, Author) commented Jul 21, 2025

No, there is not. This is why GEP was such a big mistake that is slowly being replaced by ptradd upstream. (I realize this reverts #57389, including its copy in #58768.)

Member

Yeah, the fact that this pessimizes vectorization is just that LLVM models complex GEPs differently from muls/shifts (even though the backend will probably generate similar code; it shouldn't be hard to pattern match).

@bbrehm commented Jul 21, 2025

If there is a potential problem for the vectorizer, can't we just use the old GEP code if the memory is not atomic?

Performance with locks is abysmal anyways; the remaining special case where the GEP code is invalid is stuff like NTuple{3,Int8} that must be represented as 4 bytes for AtomicMemory.

In view of current experience with LLVM refusing to optimize atomics, I view it as unlikely that the autovectorizer dares to touch them, and we can revisit that in the future if anybody complains.
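
A hedged sketch of the padding rule referred to here: atomic element storage is rounded up to a power-of-two size so hardware atomic instructions can operate on it, which is why a 3-byte NTuple{3,Int8} occupies 4 bytes per element in an AtomicMemory. The helper name below is illustrative.

#include <stddef.h>

/* Round an element size up to the next power of two (3 -> 4, 5 -> 8, 8 -> 8). */
static size_t atomic_element_size(size_t elsize)
{
    size_t s = 1;
    while (s < elsize)
        s <<= 1;
    return s;
}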

Member

Yeah, that sounds like a good idea for now

src/rtutils.c Outdated
@@ -1290,10 +1290,20 @@ static size_t jl_static_show_x_(JL_STREAM *out, jl_value_t *v, jl_datatype_t *vt
for (size_t j = 0; j < tlen; j++) {
if (layout->flags.arrayelem_isboxed) {
jl_value_t **ptr = ((jl_value_t**)m->ptr) + j;
if (layout->flags.arrayelem_islocked) {

Wait a second, boxed elements should never be locked, right?

@vtjnash (Member, Author)

Oops, I think Claude might have been too eager to use its new bit everywhere.
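
To spell out the intent behind the fix discussed in this thread (a hedged sketch, not the actual rtutils.c code): boxed elements are plain pointers and never carry a lock, so the lock offset only applies to inline-stored elements.

#include <stddef.h>

/* Illustrative only: boxed arrays index a pointer table directly; inline
 * arrays may need to skip a leading per-element lock. */
static void *element_payload(char *data, size_t j, size_t elsize,
                             int isboxed, int islocked, size_t lock_bytes)
{
    if (isboxed)
        return ((void **)data) + j;          /* no lock for boxed elements */
    char *p = data + j * elsize;             /* elsize includes the lock when present */
    return islocked ? p + lock_bytes : p;
}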

@KristofferC mentioned this pull request Jul 22, 2025
vtjnash added 2 commits July 22, 2025 16:12
The layout and printing logic need to correctly offset and align the
inset fields to account for the per-element mutex of an atomic array
with large elements.

Fix #58993
Fix #59030
Labels
backport 1.12: Change should be backported to release-1.12
bugfix: This change fixes an existing bug
multithreading: Base.Threads and related functionality

Successfully merging this pull request may close these issues.

Memory corruption, segmentation fault, with OncePerThread, Condition, ReentrantLock on v1.12.0-rc1, nightly